home
***
CD-ROM
|
disk
|
FTP
|
other
***
search
/
Games of Daze
/
Infomagic - Games of Daze (Summer 1995) (Disc 1 of 2).iso
/
x2ftp
/
msdos
/
docs
/
winer
/
chap9.txt
< prev
next >
Wrap
Text File
|
1994-09-01
|
47KB
|
1,058 lines
CHAPTER 9
PROGRAM OPTIMIZATION
Throughout the preceding chapters I have shown a variety of tips and
techniques that can help to improve the efficiency of your programs. For
example, Chapter 6 explained that processing files in large pieces reduces
the time needed to save and load data. Likewise, Chapter 8 discussed the
improvement that SWAP often provides over conventional assignments. Some
optimizations, however, do not fit into any of the well-defined categories
that have been used to organize this book. In this chapter I will share
several general optimization techniques you can employ to reduce the size
of your programs and make them run faster.
The material in this chapter is organized into three principle
categories: programming shortcuts and speed improvements, miscellaneous
tips and techniques, and benchmarking. Each section addresses BASIC
programming ideas and methods that are not immediately obvious in most
cases.
PROGRAMMING SHORTCUTS AND SPEED IMPROVEMENTS
============================================
Chapter 3 discussed the use of AND, OR, and the other logical operations
that can be used for both logical (IF and CASE) tests and also bit
operations. But there are a few other related points that are worth
mentioning here. When you need to know if a variable is zero or not, you
can omit an explicit test for zero like this:
IF Variable THEN...
You might be tempted to think that two variables could be tested for non-
zero values at one time in the same way, using code such as this:
IF Var1 AND Var2 THEN...
However, that will very likely fail. The expression Var1 AND Var2 combines
the bits in these variables, which could result in a value of zero even
when both variables are non-zero. As an example, if Var1 currently holds
a value of 1, its bits will be set as follows:
0000 0000 0000 0001
Now, if Var2 is assigned the value 2, its bits will be set like this:
0000 0000 0000 0010
Since no two bits are set in the same position in each variable, the result
of Var1 AND Var2 is zero. An effective solution is IF Var1 * Var2 THEN to
ensure that neither variable is zero. And to test if either variable is
non-zero you'd use OR. Whatever follows the test IF Var1 OR Var2 THEN will
be executed as long as one (or both) variables are not zero. These are
important short cuts to understand, because the improvement in code size
and execution speed can be significant.
Each of the AND, OR, and multiplication tests shown here generates only
11 bytes of code. Contrast that to the 28 bytes that BC creates for the
alternative: IF Var1 <> 0 AND Var2 <> 0 THEN. Because of the improved
method of expression evaluation in BASIC PDS, this last example generates
only 14 bytes when using that version of BC. None the less, if you can
avoid explicit comparisons to zero you will go a long way toward improving
the efficiency of your code.
This short cut is equally appropriate with LOOP comparisons as well as
IF tests. In the BufIn function shown in Chapter 6, INSTR was used to see
if a CHR$(13) carriage return was present in the buffer. In the statement
CR = INSTR(BufPos, Buffer$, CR$), CR receives either 0 if that character
is present, or a non-zero position in the string where it was found. The
LOOP statement that surrounded the buffer searching uses LOOP WHILE CR,
which continues looping as long as CR is not zero.
When an integer variable is compared in a LOOP WHILE condition, seven
bytes of code are generated whether it is compared to zero or not. But
when a long integer is used to control the LOOP WHILE condition, omitting
the explicit test for zero results in 11 bytes of compiled code where
including it creates 20 bytes. Note that with floating point values
identical code is generated in either case, because an explicit comparison
to zero is required and added by the compiler.
PREDEFINING VARIABLES
Another important point is illustrated in the same code fragment that uses
INSTR to search for a carriage return. There, the CR$ string variable had
been assigned earlier to CHR$(13). Although the BufIn code could have used
CR = INSTR(BufPos, Buffer$, CHR$(13)) instead of a previously defined
string variable to replace the CHR$(13), that would take longer each time
the statement is executed. Since CHR$ is a function, it must be called
each time it is used. If CR$ is defined once ahead of time, only its
address needs to be passed to INSTR. This can be done with four bytes of
assembly language code.
If CHR$(13) will be used only once in a program, then the only savings
afforded by predefining it will be execution speed. But when it is needed
two or more times, several bytes can be saved at each occurrence by using
a replacement string variable. Other common CHR$ values that are used in
BASIC programs are the CHR$(34) quote character, and CHR$(0) which is often
used when accessing DOS services with CALL Interrupt.
Likewise, you should avoid calling any functions more than is absolutely
necessary. I have seen many programmers use code similar to the following,
to see if a drive letter has been given as part of a file name.
IF INSTR(Path$, ":") THEN
Drive$ = LEFT$(INSTR(Path$, ":") - 1)
END IF
A much better approach is to invoke INSTR only once, and save the results
for subsequent testing:
Found% = INSTR(Path$, ":") 'save the result from INSTR
IF Found% THEN
Drive$ = LEFT$(Path$, Found%) - 1)
END IF
The same situation holds true for UCASE$, MID$, and all of the other BASIC
functions. Rather than this:
IF INSTR(UCASE$(MID$(Work$, 3, 22)), "/A") THEN A = True
IF INSTR(UCASE$(MID$(Work$, 3, 22)), "/B") THEN B = True
IF INSTR(UCASE$(MID$(Work$, 3, 22)), "/C") THEN C = True
Instead use this:
Temp$ = UCASE$(MID$(Work$, 3, 22))
IF INSTR(Temp$, "/A") THEN A = True
IF INSTR(Temp$, "/B") THEN B = True
IF INSTR(Temp$, "/C") THEN C = True
Where the first example generates 138 bytes of code, the second uses only
111. The time savings will be even more significant, because BASIC's
UCASE$ and MID$ functions allocate and deallocate memory by making further
calls to BASIC's string memory management routines.
Indeed, it is always best to avoid creating new strings whenever
possible, precisely because of the overhead needed to assign and erase
string data. Each time a string is assigned, memory must be found to hold
it; add to that the additional code needed to release the older, abandoned
version of the string.
This has further ramifications with simple string tests as well. As
Chapter 3 explained, testing for single characters or the first character
in a string is always faster if you isolate the ASCII value of the
character first, and then use integer comparisons later. In the example
below, the first series of IF tests generates 60 bytes of code. This is
much less efficient than the second which generates only 46, even though
the steps to obtain and assign the ASCII value of Answer$ comprise 12 of
those bytes.
PRINT "Abort, Retry, or Fail? (A/R/F) ";
DO
Answer$ = UCASE$(INKEY$)
LOOP UNTIL LEN(Answer$)
'----- Method 1:
IF Answer$ = "A" THEN
REM
ELSEIF Answer$ = "R" THEN
REM
ELSEIF Answer$ = "F" THEN
REM
END IF
'----- Method 2:
A% = ASC(Answer$)
IF A% = 65 THEN
REM
ELSEIF A% = 82 THEN
REM
ELSEIF A% = 70 THEN
REM
END IF
Another prime candidate for speed enhancement is when you need to create
a string from individual characters. The first example below reads the 80
characters in the top row of display memory, and builds a new string from
those characters.
Scrn$ = ""
FOR X = 1 TO 80
Scrn$ = Scrn$ + CHR$(SCREEN(1, X))
NEXT
Since we already know that 80 characters are to be read, a much better
method is to preassign the destination string, and insert the characters
using the statement form of MID$, thus:
Scrn$ = SPACE$(80)
FOR X% = 1 TO 80
MID$(Scrn$, X%, 1) = CHR$(SCREEN(1, X%))
NEXT
An informal timing test that executed these code fragments 100 times using
QuickBASIC 4.5 showed that the second example is nearly twice as fast as
the first. Moreover, since BASIC's SCREEN function is notoriously slow,
the actual difference between building a new string and inserting
characters into an existing string is no doubt much greater.
INTEGER AND LONG INTEGER ASSIGNMENTS
Another facet of compiled BASIC that is probably not immediately obvious
is the way that integer and long integer assignments are handled by the
compiler. When many variables are to be assigned the same value--perhaps
cleared to zero--it is often more efficient to assign one of them from that
value, and then assign the rest from the first. To appreciate why this is
so requires an understanding of how BASIC compiles such assignments.
Normally, assigning an integer or long integer variable from a numeric
constant requires the same amount of code as assigning from another
variable. The BASIC statement X% = 1234 is compiled to the following 6-
byte assembly language statement.
C7063600D204 MOV WORD PTR [X%],1234
Assigning the long integer variable Y& requires two such 6-byte
instructions--one for the low word and another for the high word:
C7063600D204 MOV WORD PTR [Y&],1234 ;assign the low word
C70638000000 MOV WORD PTR [Y&+2],0 ;then the high word
The 80x86 family of microprocessors does not have direct instructions for
moving the contents of one memory location to another. Therefore, the
statement X% = Y% is compiled as follows, with the AX register used as an
intermediary.
A13800 MOV AX,WORD PTR [Y%] ;move Y% into AX
A33600 MOV WORD PTR [X%],AX ;move AX into X%
Assigning one long integer from another as in X& = Y& is handled similarly:
A13A00 MOV AX,WORD PTR [Y&] ;move AX from Y& low
8B163C00 MOV DX,WORD PTR [Y&+2] ;move DX from Y& high
A33600 MOV WORD PTR [X&],AX ;move X& low from AX
89163800 MOV WORD PTR [X&+2],DX ;move X& high from DX
You may have noticed that instructions that use the AX registers require
only three bytes to access a word of memory, while those that use DX (or
indeed, any register other than AX) require four. But don't be so quick
to assume that BASIC is not optimizing your code. The advantage to using
separate registers is that the full value of Y& is preserved. Had AX been
used both times, the low word would be lost when the high word was
transferred from Y& to X&.
When assigning one variable to many in a row, BASIC is smart enough to
remember which values are in which registers, and it reuses those values
for subsequent assignments. The combination BASIC and assembly language
code shown below was captured from a CodeView session and edited slightly
for clarity. It shows the actual assembly language code bytes generated
for a series of assignments.
Plain integer assignments:
A% = 1234
C7063600D204 MOV WORD PTR [A%],&H04D2
B% = 1234
C7063800D204 MOV WORD PTR [B%],&H04D2
C% = 1234
C7063A00D204 MOV WORD PTR [C%],&H04D2
D% = 1234
C7063C00D204 MOV WORD PTR [D%],&H04D2
E% = 1234
C7063E00D204 MOV WORD PTR [E%],&H04D2
Plain long integer assignments:
V& = 1234
C7064000D204 MOV WORD PTR [V&],&H04D2
C70642000000 MOV WORD PTR [V&+2],0
W& = 1234
C7064400D204 MOV WORD PTR [W&],&H04D2
C70642000000 MOV WORD PTR [W&+2],0
X& = 1234
C7064800D204 MOV WORD PTR [X&],&H04D2
C70642000000 MOV WORD PTR [X&+2],0
Y& = 1234
C7064C00D204 MOV WORD PTR [Y&],&H04D2
C70642000000 MOV WORD PTR [Y&+2],0
Z& = 1234
C7065000D204 MOV WORD PTR [Z&],&H04D2
C70642000000 MOV WORD PTR [Z&+2],0
Assigning multiple integers from another:
A% = 1234
C7063600D204 MOV WORD PTR [A%],&H04D2
B% = A%
A13600 MOV AX,WORD PTR [A%]
A33800 MOV WORD PTR [B%],AX
C% = A%
A33A00 MOV WORD PTR [C%],AX
D% = A%
A33C00 MOV WORD PTR [D%],AX
E% = A%
A33E00 MOV WORD PTR [E%],AX
Assigning multiple long integers from another:
V& = 1234
C7064000D204 MOV WORD PTR [V&],&H04D2
C70642000000 MOV WORD PTR [V&+2],0
W& = V&
A14000 MOV AX,WORD PTR [V&]
8B164200 MOV DX,WORD PTR [V&+2]
A34400 MOV WORD PTR [W&],AX
89164600 MOV WORD PTR [W&+2],DX
X& = V&
A34800 MOV WORD PTR [X&],AX
89164A00 MOV WORD PTR [X&+2],DX
Y& = V&
A34C00 MOV WORD PTR [Y&],AX
89164E00 MOV WORD PTR [Y&+2],DX
Z& = V&
A35000 MOV WORD PTR [Z&],AX
89165200 MOV WORD PTR [Z&+2],DX
The first five statements assign the value 1234 (04D2 Hex) to integer
variables, and each requires six bytes of code. The next five instructions
assign the same value to long integers, taking two such instructions for
a total of 12 bytes for each assignment. Note that a zero is assigned to
the higher word of each long integer, because the full Hex value being
assigned is actually &H000004D2. Simple multiplication shows that the five
integer assignments generates five times six bytes, for a total of 30
bytes. The long integer assignments take twice that at 60 bytes total.
But notice the difference in the next two statement blocks. The first
integer assignment requires the usual six bytes, and the second does as
well. But thereafter, any number of additional integer variables will be
assigned with only three bytes apiece. Likewise, all but the first two
long integer assignments are implemented using only seven bytes each.
Remembering what values are in each register is yet one more optimization
that BASIC performs as it compiles your program.
SHORT CIRCUIT EXPRESSION EVALUATION
Many programming situations require more than one test to determine if a
series of instructions are to be executed or a branch taken. The short
example below tests that a string is not null, and also that the row and
column to print at are legal.
IF Work$ <> "" AND Row <= 25 AND Column <= 80 THEN
LOCATE Row, Column
PRINT Work$
END IF
When this program is compiled with QuickBASIC, all three of the tests are
first performed in sequence, and the results are then combined to see if
the LOCATE and PRINT should be performed. The problem is that time is
wasted comparing the row and column even if the string is null. When speed
is the primary concern, you should test first for the condition that is
most likely to fail, and then use a separate test for the other conditions:
IF Work$ <> "" THEN
IF Row <= 25 AND Column <= 80 THEN
LOCATE Row, Column
PRINT Work$
END IF
END IF
This separation of tests is called *short circuit expression evaluation*,
because you are bypassing--or short circuiting--the remaining tests when
the first fails. Although it doesn't really take BASIC very long to
determine if a string is null, the principle can be applied to other
situations such as those that involve file operations like EOF and LOF.
Further, as you learned in Chapter 3, a better way to test for a non-null
string is IF LEN(Work$) THEN. However, the point is to perform those tests
that are most likely to fail first, before others that are less likely or
will take longer.
Another place where you will find it useful to separate multiple tests
is when accessing arrays. If you are testing both for a legal element
number and a particular element value, QuickBASIC will give a "Subscript
out of range" error if the element number is not valid. This is shown
below.
IF Element <= MaxEls AND Array(Element) <> 0 THEN
Since QuickBASIC always performs both tests, the second will cause an error
if Element is not a legal value. In this case, you *have* to implement the
tests using two separate statements:
IF Element <= MaxEls THEN
IF Array(Element) <> 0 THEN
.
.
END IF
END IF
You may have noticed the I have referred to QuickBASIC here exclusively
in this discussion. Beginning with BASIC 7.0, Microsoft has added short
circuit testing to the compiler as part of its built-in decision making
process. Therefore, when you have a statement such as this one:
IF X > 1 AND Y = 2 AND Z < 3 THEN
BASIC PDS substitutes the following logic automatically:
IF X <= 1 THEN GOTO SkipIt
IF Y <> 2 THEN GOTO SkipIt
IF Z >= 3 THEN GOTO SkipIt
.
.
SkipIt:
Speaking of THEN and GOTO, it is worth mentioning that the keyword THEN
is not truly necessary when the only thing that follows is a GOTO. That
is, IF X < 1 GOTO Label is perfectly legal, although the only savings is
in the program's source code.
This next and final trick isn't technically a short circuit expression
test, but it can reduce the size of your programs in a similar fashion.
Chapter 3 compared the relative advantages of GOSUB routines and called
subprograms, and showed that a subprogram is superior when passing
parameters, while a GOSUB is much faster and smaller. An ideal compromise
in some situations is to combine the two methods.
If you have a called subprogram (or function) that requires a large
number of parameters and it is called many times, you can use a single call
within a GOSUB routine. Since a GOSUB statement generates only three bytes
of code each time it is used, this can be an ideal way to minimize the
number of times that the full CALL is required. Of course, GOSUB does not
accept parameters, but many of them may be the same from call to call.
In particular, some third-party add-on libraries require a long series of
arguments that are unlikely to change. This is shown below.
Row = 10
Column = 20
Message$ = "Slap me five"
GOSUB DisplayMsg
.
.
DisplayMsg:
CALL ManyParams(Row, Column, Message$, MonType, NoSnow, FGColr, BGColr, _
HighlightFlag, VideoMode, VideoPage)
RETURN
In many cases you would have assigned permanent values for the majority
of these parameters, and it is wasteful to have BASIC create code to pass
them repeatedly. Here, the small added overhead of the three assignments
prior to each GOSUB results in less code than passing all ten arguments
repeatedly.
MISCELLANEOUS TIPS AND TECHNIQUES
=================================
There are many tricks that programmers learn over the years, and the
following are some of the more useful ones I have developed myself, or come
across in magazines and other sources.
FORMATTING AND ROUNDING
One frequent requirement in many programs is having control over how
numbers are formatted. Of course, BASIC has the PRINT USING statement
which is adequate in most cases. And Chapter 6 also showed how to trick
BASIC's file handling statements into letting you access a string formatted
by PRINT USING. But there are other formatting issues that are not handled
by BASIC directly.
One problem for many programmers is that BASIC adds leading and trailing
blanks when printing numbers on the screen or to a disk file. The leading
blank is a placeholder for a possible minus sign, and is not added when the
number is in fact negative. Avoiding the trailing blank is easy; simply
use PRINT STR$(Number). And the easiest way to omit the leading blank for
positive numbers is to use LTRIM$: PRINT LTRIM$(STR$(Number)).
PRINT USING is notoriously slow, because examines each character in a
string version of the number, and reformat the digits while interpreting
the many possible options specified in a separate formatting string. But
in many cases all that is really needed is simple right justification. To
right-align an integer value (or series of values) you can use RSET to
assign the numbers into a string, and then print that string as shown
below.
Work$ = SPACE$(10)
REST Work$ = STR$(Number)
PRINT TAB(15); Work$
In this case, Work$ could also have been dimensioned as a fixed-length
string. Adding leading zeros to a number is also quite easy using RIGHT$
like this:
PRINT RIGHT$("00000" + LTRIM$(STR$(Number)), 6)
You will need at least as many zeros in the string as the final result
requires, less one since STR$ always returns at least one digit. Trailing
digits are handled similarly, except you would use LEFT$ instead of RIGHT$.
Rounding numbers is an equally common need, and there are several ways
to handle this. Of course, INT and FIX can be used to truncate a floating
point value to an integer result, but neither of these perform rounding.
For that you should use CINT or CLNG, which do round the number to the
closest integer value. For example, Value = CINT(3.59) will assign 4 to
Value, regardless of whether Value is an integer, single precision, or
whatever.
Some BASICs have a CEIL function, which returns the next *higher* integer
result. That is, CEIL(3) is 3, but CEIL(3.01) returns the value 4. This
function can be easily simulated using Ceil = -INT(-Number).
Rounding algorithms are not quite so simple to implement, as you can see
in the short DEF FN function below.
DEF FnRound# (Value#, Digits%)
Mult% = 10 ^ Digits%
FnRound# = FIX((Mult% * Value#) + (SGN(Value#)) * .5#) / Mult%
END DEF
Another important math optimization is to avoid exponentiation whenever
possible. Whether you are using integers or floating point numbers, using
Number ^ 2 and Number ^ 3 are many times slower than Number * Number and
Number * Number * Number respectively.
STRING TRICKS AND MINIMIZING PARAMETERS
There are a few string tricks and issues worth mentioning here too. The
fastest and smallest way to clear a string without actually deleting it is
with LSET Work$ = "". Another clever and interesting string trick lets you
delete a string with only nine bytes of code, instead of the usual 13.
In Chapter 6 you learned that the assembly language routines within
BASIC's runtime library are accessible if you know their names. You can
exploit that by using the B$STDL (string delete) routine, which requires
less code to set up and call than the more usual Work$ = "". When a string
is assigned to a null value, two parameters--the address of the target
string and the address of the null--are passed to the string assignment
routine. But B$STDL needs only the address of the string being deleted.
You might think that BASIC would be smart enough to see the "" null and
call B$STDL automatically, but it doesn't. Here is how you would declare
and call B$STDL:
DECLARE SUB DeleteStr ALIAS "B$STDL" (Work$)
CALL DeleteStr(Any$)
As with the examples that let you call GET # and PUT # directly, DeleteStr
will not work in the QB environment unless you first create a wrapper
subprogram written in BASIC, and include that wrapper in a Quick Library.
And this brings up an important point. Why bother to write a BASIC
subprogram that in turn calls an internal routine, when the BASIC
subprogram could just as easily delete the string itself? Therefore, the
best solution--especially because it's also the easiest--is to write
DeleteStr in BASIC thus:
SUB DeleteStr(Work$)
Work$ = ""
END SUB
This is an important concept to be sure, because it shows how to reduce
the number of parameters when a particular service is needed many times.
Other similar situations are not hard to envision, whereby multiple
parameters that do not change from call to call can be placed into a
subprogram that itself requires only one or two arguments.
This technique can be extended to several BASIC statements that use more
parameters than might otherwise be apparent. For example, whenever you use
LOCATE, additional hidden parameters are passed to the B$LOCT routine
beyond those you specify. The statement LOCATE X, Y generates 22 bytes of
code, even though other called routines that take two parameters need only
13. (Every passed parameter generates four bytes of code, and the actual
CALL adds five more. This is the same whether the routine being called is
an internal BASIC statement, a BASIC subprogram or function, or an assembly
language routine.) Therefore, if you use LOCATE with two arguments
frequently in a program, you can save nine bytes for each by creating a
BASIC subprogram that performs the LOCATE:
SUB LocateIt(Row, Column) STATIC
LOCATE Row, Column
END SUB
Similarly, if you frequently turn the cursor on and off, you should create
two subprograms--perhaps called CursorOn and CursorOff--that invoke LOCATE.
Since no parameters are required, the savings will add up quickly. Calling
either of the subprograms below generates only five bytes of code, as
opposed to 18 for the statement LOCATE , , 1 and 20 for LOCATE , , 0.
SUB CursorOn STATIC
LOCATE , , 1
END SUB
SUB CursorOff STATIC
LOCATE , , 0
END SUB
The COLOR statement also requires more parameters than the number of
arguments you give. Where COLOR FG, BG generates 22 bytes of compiled
code, CALL ColorIt(FG, BG) creates only 13. CLOSE is yet another BASIC
statement that accepts multiple arguments, and it too requires hidden
parameters. Using CLOSE #X compiles to 13 bytes, and CALL CloseIt(X) is
only nine.
The reason that BASIC sends more parameters than you specify is because
these routines need extra information to know which and how many arguments
were given. In the case of LOCATE, each argument is preceded with a flag
that tells if the next one was given. CLOSE is similar, except the last
parameter tells how many file numbers were specified. Remember, you can
use CLOSE alone to close all open files, or CLOSE 1, 3, 4 to close only
those files numbers. Therefore, BASIC requires some way to tell the CLOSE
statement how many file numbers there are.
Another place where several statements can be consolidated within a
single procedure is when peeking and poking memory. BASIC's PEEK and POKE
are limited because they can access only one byte in memory at a time. But
many useful memory locations are in fact organized as a pair of bytes, as
you will see in Chapter 10. Instead of using code to combine or separate
the bytes each time memory is accessed, you can use the following short
routines that let you peek and poke two bytes at once.
DECLARE FUNCTION PeekWord%(Address%)
PeekWord% = PEEK(Address%) + 256 * PEEK(Address% + 1)
END FUNCTION
DECLARE SUB PokeWord(Address%, Value%)
POKE Address%, Value% AND 255
POKE Address% + 1, Value% \ 256
END SUB
Because these routines use BASIC's PEEK and POKE, you still need to use DEF
SEG separately. Of course, the segment could be added as another
parameter, and assigned within the routines:
DECLARE FUNCTION PeekWord%(Segment%, Address%)
DEF SEG = Segment%
PeekWord% = PEEK(Address%) + 256 * PEEK(Address% + 1)
END FUNCTION
WORD WRAPPING
A string handling technique you will surely find useful is implementing
word wrapping. There are a number of ways to do this, and the following
code shows one that I have found to be very efficient.
DEFINT A-Z
SUB WordWrap (X$, Wide, LeftMargin)
Length = LEN(X$) 'remember the length
Pointer = 1 'start at the beginning of the string
IF LeftMargin = 0 THEN LeftMargin = 1
'Scan a block of Wide characters backwards, looking for a blank. Stop
' at the first blank, or upon reaching the beginning of the string.
DO
FOR X = Pointer + Wide TO Pointer STEP -1
IF MID$(X$, X, 1) = " " OR X = Length + 1 THEN
LOCATE , LeftMargin
PRINT MID$(X$, Pointer, X - Pointer);
Pointer = X + 1
WHILE MID$(X$, Pointer, 1) = " "
Pointer = Pointer + 1
WEND
IF POS(0) > 1 THEN PRINT
EXIT FOR
END IF
NEXT
LOOP WHILE Pointer < Length
END SUB
The WordWrap subprogram expects the text for display to be in a single
long string. You pass it that text, a left margin, and a width. You could
certainly add enhancements to this routine such as a color parameter, or
the ability to format the text and send it to a printer or disk file.
UNUSUAL WAYS TO ACCESS DISPLAY MEMORY
If you ever tried to print a character in the lower-right corner of the
display screen, you probably discovered that it cannot be done [with many
BASIC versions] without causing the screen to scroll up. The only solution
I am aware of is to use POKE to assign the character (and optionally its
color) to display memory directly as shown below.
DEF SEG = &HB800 'use &HB000 for a monochrome display
POKE 3998, 65 'ASCII code for the letter "A"
POKE 3999, 9 'bright blue on black
The second trick also uses display memory in an unconventional manner.
All video adapters contain at least 4096 bytes of on-board memory. Even
though a 25 line by 80 column text mode screen uses only 4000 bytes (2000
characters plus 2000 colors), memory chips are built in multiples of 1,024
bytes. Therefore, you can use the last 96 bytes on the display adapter
in your programs. If the adapter supports multiple video pages, then you
can use the last 96 bytes in each 25-line page.
One use for this memory is to provide a way to communicate small amounts
of information between separate programs. When you don't want to structure
an application to use CHAIN, the only other recourse is to use a disk file
to pass information between the programs. But if all that is needed is a
file name or drive letter, using a file can be awkward and slow, especially
if the program is running from a floppy disk.
One way to access this video memory is with PEEK and POKE. But PEEK and
POKE are awkward too, and can access only one byte at a time. A better
approach is to use an assembly language routine to copy one contiguous
memory block to another location. The MemCopy routine below is designed
to do exactly this.
;MEMCOPY.ASM, copies a block of memory from here to there
.Model Medium, Basic
.Code
MemCopy Proc Uses DS ES SI DI, FromAdr:DWord, ToAdr:DWord, NumBytes:Word
Cld ;copy in the forward direction
Mov SI,NumBytes ;get the address for NumBytes%
Mov CX,[SI] ;put it into CX for copying below
Les DI,FromAdr ;load ES:DI with the source address
Lds SI,ToAdr ;load DS:SI with destination address
Shr CX,1 ;copy words instead of bytes for speed
Rep Movsw ;do the copy
Adc CX,CX ;this will set CX to either 0 or 1
Rep Movsb ;copy the odd byte if necessary
Ret ;return to BASIC
MemCopy Endp
End
MemCopy may be declared and called in two different ways. The first uses
SEG and is most appropriate when you are copying data between variables,
for example from a group of elements in one array to elements in another.
The second lets you specify any arbitrary segment and address, and it
requires the BYVAL modifier either in the DECLARE statement, the CALL, or
both. Each method is shown below.
DECLARE SUB MemCopy(SEG AnyVar1, SEG AnyVar2, Numbytes%)
CALL MemCopy(AnyVar1, AnyVar2, NumBytes%)
DECLARE SUB MemCopy(BYVAL Seg1%, BYVAL Adr1%, BYVAL Seg2%, _
BYVAL Adr2%, NumBytes%)
CALL MemCopy(SourceSeg%, SourceAdr%, DestSeg%, DestAdr%, NumBytes%)
You may also use a combination of these, perhaps with SEG for the source
argument and BYVAL for the second. For example, to copy a 20-byte TYPE
variable to the area just past the end of video memory on a color display
adapter you would do this:
CALL MemCopy(SEG TypeVar, BYVAL &HB800, BYVAL 4000, 20)
In many cases you may need to use MemCopy in more than one way in the same
program. For this reason it is probably better not to declare it at all.
Once a subprogram or function has been declared, BASIC will refuse to let
you change the number or type of parameters. But if you don't include a
declaration at all, you are free to use any combination of SEG and BYVAL,
and also any type of variable.
It is important to understand that numeric and TYPE variables should be
specified using SEG, so MemCopy will know the full address where the
variable resides. You could use a combination of BYVAL VARSEG(Variable)
and BYVAL VARPTR(Variable), but that is not quite as efficient as SEG.
Copying to or from a conventional string using QuickBASIC requires SADD
(string address) instead of VARPTR; far strings in BASIC 7 require SADD,
and also SSEG (string segment) instead of VARSEG.
REBOOTING A PC
Another simple trick that is not obvious to many programmers is how to
reboot a PC. Although most PC technical reference manuals show an
interrupt service for rebooting, that simply does not work with most
computers. However, every PC has a BIOS routine that is at a fixed
address, and which may be called directly like this:
DEF SEG = &HFFFF
CALL Absolute(0)
The Absolute routine is included in thee QB and QBX libraries that come with
BASIC. If a cold boot with the full memory test and its attendant delay
is acceptable, then the code shown above is all that you need. Otherwise,
you must poke the special value &H1234 in low memory as a flag to the BIOS
routine, so it will know that you want a warm boot instead:
DEF SEG = 0
POKE &H473, &H12
POKE &H472, &H34
DEF SEG = &HFFFF
CALL Absolute(0)
INTEGER VALUES GREATER THAN 32K
As you learned in Chapter 2, an integer variable can hold any value between
-32768 and 32767. When this range of numbers is considered, the integer
is referred to as being a signed number. But the same range of values can
also be treated as unsigned numbers spanning from 0 through 65535. Since
BASIC does not support unsigned integers, additional trickery is often
needed to pass values between 32768 and 65535 to assembler routines and DOS
and BIOS services you invoke with CALL Interrupt. One way to do this is
to use a long integer first, and add an explicit test for values higher
than 32767:
Temp& = NumBytes&
IF Temp& > 32767 THEN
IntBytes% = Temp& - 65536
ELSE
IntBytes% = Temp&
END IF
To reverse the process you would test for a negative value:
IF IntBytes% < 0 THEN
NumBytes& = IntBytes% + 65536
ELSE
NumBytes& = IntBytes%
END IF
Although this method certainly works, it is inefficient because of the
added IF testing. When you merely need to pass a variable to a called
routine, you can skip this testing and simply pass the long integer
directly. This may appear counter to the rule that you must always pass
the same type of variable that a subroutine expects. But as long as the
arguments are not being passed by value using BYVAL, this method works and
adds no extra code.
When a parameter is passed to a subprogram or function, BASIC sends the
address of its first byte as shown in Figure 9-1.
─ ─ ─ ┬────┬────┬────┬────┬ ─ ─ ─
│ B1 │ B2 │ B3 │ B4 │
─ ─ ─ ┴────┴────┴────┴────┴ ─ ─ ─
^
└───────── Address passed to the routine
Figure 9-1: Passing a long integer where a regular integer is expected.
Here, B1, B2, and so forth refer to the Bytes 1 through 4 of a long integer
variable. Since the assembly language routine is expecting a regular
integer, it looks at just the first two bytes of the variable. Thus, a
long integer can be used even when a conventional integer is expected. Of
course, any excess greater than 65535 will be ignored by the routine, since
the bits that hold the excess are in the third and fourth bytes.
BENCHMARKING
============
Throughout this book I have emphasized the importance of writing code that
is as small and fast as possible. And these goals should be obvious to all
but the most novice programmer. But it is not always obvious how to
determine for yourself which of several approaches yields code that is the
smallest or fastest. One way is to use Microsoft CodeView, which lets you
count the bytes of assembler code that are generated. This is how I
obtained the byte counts stated throughout this book.
But smaller is not always faster. Further, the code that BASIC
generates is not the whole story. In many cases BASIC makes calls to its
runtime library routines, and you would have to trace through those as well
to know the total byte count for a given statement. It is not impossible
to trace through the BASIC runtime using CodeView, but it certainly can be
tedious. Many of BASIC's internal routines are very convoluted--especially
those that allocate and deallocate string and other memory. Often it is
simpler to devise a test that executes a series of statements many times,
and then time how long the test took.
As an example for this discussion, I will compare two different ways to
print three strings in succession and show how to tell which produces less
code, and which is faster. The first statement below prints each string
separately, and the second combines the strings and then prints them as
one.
1: PRINT X$; Y$; Z$
2: PRINT X$ + Y$ + Z$
Since the length of each string will certainly influence how long it takes
to print them, each of the strings is first initialized to 80 characters
as follows:
X$ = STRING$(80, "X")
Y$ = STRING$(80, "Y")
Z$ = STRING$(80, "Z")
It is important to understand that the PRINT statement itself will be a
factor, since it takes a certain amount of time to copy the characters
from each string to display memory. Worse, if the screen needs to be
scrolled because the text runs past the bottom of the display, that will
take additional time. To avoid the overhead of scrolling, the test program
uses LOCATE to start each new print statement at the top of the screen.
Of course, using LOCATE adds further to the overhead, but in this case much
less than scrolling would. To prove this to yourself, disable the line
that contains the LOCATE statement. Here's the complete benchmark program:
CLS
X$ = STRING$(80, "X") 'create the test string
Y$ = STRING$(80, "Y")
Z$ = STRING$(80, "Z")
Synch! = TIMER 'synchronize to TIMER
DO
Start! = TIMER
LOOP WHILE Start! = Synch!
FOR X = 1 TO 1000 '1000 times is adequate
LOCATE 1
PRINT X$; Y$; Z$
NEXT
Done! = TIMER 'calculate elapsed time
Test1! = Done! - Start!
Synch! = TIMER 'as above
DO
Start! = TIMER
LOOP WHILE Start! = Synch!
FOR X = 1 TO 1000
LOCATE 1
PRINT X$ + Y$ + Z$
NEXT
Done! = TIMER
Test2! = Done! - Start!
PRINT USING "##.## seconds using three strings"; Test1!
PRINT USING "##.## seconds using concatenation"; Test2!
Notice the extra step that synchronizes the start of each test to BASIC's
TIMER function. As you probably know, the PC's system time is updated
approximately 18 times per second. Therefore, it is possible that the test
loop could begin just before the timer is about to be incremented. In that
case the elapsed time would appear to be 1/18th second longer than the
actual time. To avoid this potential inaccuracy, the DO loop waits until
a new time period has just begun. There is still a similar accuracy loss
at the end of the test when Done! is assigned from TIMER. But by
synchronizing the start of the test, the error is limited to 1/18th second
instead of twice that.
When you compile and run this program using QuickBASIC 4.5, it will be
apparent that the first test is more than three times faster than the
second. However, with BASIC 7.1--using either near or far strings--the
second is in fact slightly faster. Therefore, which is better depends on
the version of your compiler, and there is no single best answer. Now
let's compare code size.
The disassemblies shown below are valid for both QuickBASIC 4.5 and
BASIC 7.1. By counting bytes you can see that printing the strings using
a semicolon generates 27 bytes, while first concatenating the strings
requires 29 bytes.
PRINT X$; Y$; Z$
B83600 MOV AX,X$ ;get the address for X$
50 PUSH AX ;pass it on
9AD125FF4A CALL B$PSSD ;print with a semicolon
B83A00 MOV AX,Y$ ;as above for Y$
50 PUSH AX
9AD125FF4A CALL B$PSSD
B83E00 MOV AX,Z$
50 PUSH AX
9AD625FF4A CALL B$PESD ;print with end of line
PRINT X$ + Y$ + Z$
B83600 MOV AX,X$ ;get the address for X$
50 PUSH AX ;pass it on
B83A00 MOV AX,Y$ ;get the address for Y$
50 PUSH AX ;pass that on too
9AD728FF4A CALL B$SCAT ;call String Concatenate
50 PUSH AX ;pass the combined result
B83E00 MOV AX,Z$ ;get the address for Z$
50 PUSH AX ;pass it on
9AD728FF4A CALL B$SCAT ;combine that too
50 PUSH AX ;pass X$ + Y$ + Z$
9AD625FF4A CALL B$PESD ;print with end of line
Even though the first example uses a single PRINT statement, BASIC treats
it as three separate commands:
PRINT X$;
PRINT Y$;
PRINT Z$
The second example that concatenates the strings requires slightly more
code because of the repeated calls to the B$SCAT (string concatenate)
routine. Therefore, if you are using QuickBASIC it is clear that printing
the strings separately is both smaller and faster. BASIC PDS users must
decide between slightly faster performance, or slightly smaller code.
These tests were repeated 1000 times to minimize the inaccuracies
introduced by the timer's low resolution. Since this method of timing can
be off by as much as 1/18th second (55 milliseconds), for test results to
be accurate to 1% the test must take at least 5.5 seconds to complete.
In most cases that much precision is not truly necessary, and other factors
such as the time to use LOCATE will prevent absolute accuracy anyway.
It is important that any timing tests you perform be done after
compiling the program to an .EXE file. The BASIC editor is an interpreter,
and is generally slower than a stand-alone program. Further, the reduction
in speed is not consistent; some statements are nearly as fast as in a
compiled program, and some are much slower.
To obtain more accurate results than those shown here requires some
heavy ammunition; I recommend the Source Profiler from Microsoft. This
is a utility program that times procedure calls within a running program
to an accuracy of one microsecond. The Source Profiler supports all
Microsoft languages including QuickBASIC and BASIC PDS.
To time a program you must compile and link it using the /zi and /co
CodeView switches. This tells BASIC and LINK to add symbolic information
that shows where variables and procedures are located, and also relates
each logical line of source code to addresses in the .EXE file. The Source
Profiler then uses this information to know where each source-language
statement begins and ends.
You should also understand that there's a certain amount of overhead
associated with the timing loop itself. Any FOR/NEXT loop requires a
certain amount of time just to increment the counter variable and compare
it to the ending value. Fortunately, this overhead can be easily isolated,
using an empty loop with the same number of iterations. The short complete
program that follows shows this in context.
Synch! = TIMER
DO
Start! = TIMER
LOOP WHILE Start! = Synch!
FOR X& = 1 TO 50000
NEXT
Done! = TIMER
Empty! = Done! - Start!
PRINT USING "##.## seconds for the empty loop"; Empty!
Synch! = TIMER
DO
Start! = TIMER
LOOP WHILE Start! = Synch!
FOR X& = 1 TO 50000
X! = -Y!
NEXT
Done! = TIMER
Assign! = Done! - Start!
PRINT USING "##.## seconds for the assignments"; Assign!
Actual! = Assign! - Empty!
PRINT USING "##.## seconds actually required"; Actual!
SUMMARY
=======
In this chapter you learned a variety of programming shortcuts and other
techniques. You saw firsthand how it is more efficient to avoid using CHR$
and other BASIC functions repeatedly, in favor a single call ahead of time
when possible. In a similar vein, you can reduce the size of your programs
by consolidating multiple instances of UCASE$, MID$, LTRIM$, and other
functions once before a series of IF tests, rather than use them each time
for each test.
You also learned that assigning multiple variables in succession from
another often results in smaller code than assigning from the same numeric
constant. Short circuit expression evaluation was described, and examples
showed you how that technique can improve the efficiency of a QuickBASIC
program. But since BASIC PDS already employs this optimization, multiple
AND conditions are not needed when using that version of compiler.
This chapter explained the importance of reducing the number of
parameters you pass to a subprogram or function, and showed how you can use
GOSUB to invoke a central handler that in turn calls the routine.
Likewise, when using BASIC statements such as LOCATE, COLOR, and CLOSE that
require additional arguments beyond those you specify, a substantial amount
of code can be saved by creating a BASIC subprogram wrapper. Examples for
turning the cursor on and off were shown, and these can save 13 and 15
bytes per use respectively.
Several programming techniques were shown, including a word wrap
subprogram, a numeric rounding function, and a simple way to reboot the PC.
You also learned how small amounts of data can be safely stored in the last
96 bytes of video memory, perhaps for use as a common data area between
separately run [non-chained] programs.
Finally, this chapter included a brief discussion of some of the issues
surrounding benchmarking, and explained how to obtain reasonably accurate
statement timings. To determine the size of the compiler-generated code
requires disassembling with CodeView.
Chapter 10 continues with a complete list of key addresses in low memory
you are sure to find useful, and discusses each in depth along with
accompanying examples.